Abstract

A research activity entitled computational struc- 
tural mechanics (CSM) at the NASA Langley Re- 
search Center is described. This activity involves 
developing advanced structural analysis and compu- 
tational methods that exploit high-performance com- 
puters. New methods are developed in the framework 
of the CSM Testbed software system and applied to 
representative complex structural analysis problems 
from the aerospace industry. An overview of the 
CSM Testbed methods development environment is 
presented and some new numerical methods devel- 
oped on a CRAY-2 computer system are described. 
Selected application studies performed on the NAS 
CRAY-2 computer system are also summarized. 

Introduction 

Research in computational methods for structural 
analysis is encumbered by the complexity and cost 
of the software development. In addition, new com- 
puter architectures with vector and multiprocessor 
capabilities are being manufactured to provide in- 
creased computational power. Analysis and compu- 
tational algorithms that can exploit these new com- 
puter architectures need to be developed. These new 
algorithms should be developed and evaluated in a 
standard, general-purpose, finite-element structural 
analysis software system rather than in an isolated 
research software system so they can be evaluated on 
large-scale application problems as well as on small 
verification problems. 

At the NASA Langley Research Center (LaRC), 
a research effort is being directed towards developing 
advanced structural analysis methods and identify- 
ing the requirements for next-generation structural 
analysis software which will exploit multiple vector 
processor computers (ref. 1). This activity, called 
computational structural mechanics, or CSM (ref. 2), 
has resulted in the development of the CSM Testbed 
software system (e.g., Lotts et al. (ref. 3)) to aid in 
the definition of these requirements and to serve as 
a “proving ground” for new methods for large-scale 
structural application problems. This research activ- 
ity makes extensive use of the computational facilities 
provided by the Numerical Aerodynamic Simulation 
(NAS) Program at the NASA Ames Research Center 
(ref. 4). 

This paper describes the implementation expe- 
riences, the resulting capability, and the future di- 
rections for the CSM Testbed on supercomputers. 
The distributed nature of the computing hardware 
environment is described herein and its use demon- 
strated. The flexibility of the CSM Testbed, coupled 
with the computational facilities available through
the NAS system, makes it possible for structural analysts,
method developers, numerical analysts, and
computer scientists to integrate their research in a 
common, shared computing environment. The pow- 
erful, problem-solving capability of this computing 
environment is demonstrated in the solution of sev- 
eral structural application problems involving linear 
and nonlinear stress analysis, buckling analysis, and 
transient dynamics analysis. 

Overview of CSM Testbed 

The field of computerized structural analysis is 
dominated by two types of computer programs. One 
type is the huge, 2000-subroutine, general-purpose 
program (ref. 5) that is the result of over 100 man- 
years of effort spanning more than a decade. The 
other type is the relatively small, special-purpose 
code resulting from a research environment that rep- 
resents a 1- to 2-year effort for a specific research ap- 
plication. This dichotomy has resulted in long delays 
in making research technology available for critical 
structural analysis problems that NASA faces. To 
accelerate the introduction of successful research 
technology into large-scale applications programs, a 
modular, public-domain, machine-independent, ar- 
chitecturally simple software development environ- 
ment has been constructed. This system is denoted 
the CSM Testbed. One goal of the CSM Testbed 
is to provide a common structural analysis environ- 
ment for three types of users — engineers solving com- 
plex structures problems, researchers developing ad- 
vanced structural analysis methods, and developers 
designing the software architecture to exploit multi- 
processor computers. 

The CSM Testbed software system is a highly 
modular and flexible structural analysis system for 
studying computational methods and for exploring 
new multiprocessor and vector computers. The CSM 
Testbed is used by a group of researchers from uni- 
versities, industry, and government agencies. Un- 
restricted access to all parts of the code, including the 
data manager and the command language, is permit- 
ted. Research on these elements of software design is 
needed because deficiencies in the data management 
strategy can have a devastating impact on the per- 
formance of a large structural analysis code, totally 
masking the relative merits of competing compu- 
tational techniques. Furthermore, software designs 
that exploit multiprocessor computers must be devel- 
oped; in particular, techniques for handling parallel 
input/output (I/O) are required. 

The CSM Testbed is public-domain software, and 
source code is available. The initial CSM Testbed, 
called NICE/SPAR, began with the integration of 
the NICE system (refs. 6 and 7) and Level 13 of
SPAR (ref. 8). Since then, new capabilities and
improvements have been implemented in the CSM 
Testbed. A brief description of selected CSM Testbed 
processors is given in table 1. Each step of the
evolution of the CSM Testbed provides improved 
structural analysis capabilities to structural analysts. 

Distributed Computer Environment 

Distributed computer environments are made up 
of stand-alone computers of different sizes, architec- 
tures, and vendors, with a common network proto- 
col offering the user easy file transfer and remote 
login functions. Structural analysts require the 
diverse computer capabilities offered by a dis- 
tributed environment (workstation-mainframe- 
supercomputer) but cannot afford the “overhead” of 
learning the operating system commands for each 
system they use. Applications developers have a sim- 
ilar problem, but at a lower level. They cannot afford 
the overhead of learning a new set of system calls for 
each computer on which they wish to implement their 
application code. The CSM Testbed, as depicted in 
figure 1, addresses these problems. The inner circle,
the computer-specific operating system, is provided 
by the computer vendor and is different for each ven- 
dor. The outer ring, the applications development 
environment, insulates both the user and the appli- 
cations developer from those differences by providing 
a consistent interface. The methods development en- 
vironment of the CSM Testbed is described by Gillian 
and Lotts (ref. 9). 

The computing environment of the CSM activ- 
ity is currently a distributed environment, as shown 
in figure 2. Typically, a structural analyst will de- 
velop a finite-element model of the structure either 
by using a preprocessing software system such as 
PATRAN or by using CLAMP (command language 
for applied mechanics processor) for “parameteriz- 
ing” the model. Run streams are the vehicle used to 
perform structural analyses with the CSM Testbed. 
The term run stream most commonly refers to the 
file (or files) of input data and commands used to 
perform a specific analysis, although it may also re- 
fer to input at an interactive session. Run streams 
for the CSM Testbed are usually developed, verified 
on a workstation, and then transferred to the NAS 
CRAY-2 computer system for complete processing. 
Following a successful execution, the computational 
data base may then be “unloaded” (i.e., converted 
from the binary format of the NAS CRAY-2 com- 
puter system to the ASCII format), transferred intact 
to LaRC using the NASnet wide-area network, and 
then “loaded” (i.e., converted from ASCII format to 
the binary format of the desired workstation) back 
into a computational data base which has the identical
Testbed library format as on the NAS CRAY-2
computer system. Finally, postprocessing is done to 
help the structural analyst visualize the computed 
structural response. The sequence of steps just de- 
scribed depicts the computing environment to which 
the structural analyst must adapt in order to exploit 
the full potential of these computing systems. 
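
The unload and load steps described above can be illustrated with a minimal sketch. It is not the Testbed utility: the function names `unload` and `load` are hypothetical, and the two binary layouts are simply assumed to be big-endian and little-endian 64-bit floats (the actual CRAY-2 and workstation formats are not reproduced here). The sketch only shows the idea of passing through a machine-independent ASCII form.

```python
import struct

def unload(binary: bytes, byte_order: str) -> str:
    """Convert a block of 64-bit floats from a machine binary layout to ASCII text."""
    count = len(binary) // 8
    values = struct.unpack(f"{byte_order}{count}d", binary)
    # repr() of a Python float round-trips exactly, so no precision is lost.
    return "\n".join(repr(v) for v in values)

def load(text: str, byte_order: str) -> bytes:
    """Convert the ASCII text back to the binary layout of the target machine."""
    values = [float(line) for line in text.splitlines()]
    return struct.pack(f"{byte_order}{len(values)}d", *values)

# Hypothetical data set: three nodal displacements.
displacements = [0.0, 1.5e-3, -2.25e-3]

source_binary = struct.pack(">3d", *displacements)  # assumed "supercomputer" layout
ascii_text = unload(source_binary, ">")              # machine-independent form for transfer
target_binary = load(ascii_text, "<")                # assumed "workstation" layout
recovered = list(struct.unpack("<3d", target_binary))
```

The binary images on the two machines differ, but the values recovered on the target machine are identical to those written on the source machine.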

To exploit this new computing environment, ex- 
pertise is needed in the areas of computational strate- 
gies, numerical techniques, computer science, and 
communication networks, together with a firm un- 
derstanding of the principles of structural mechan- 
ics. New computing hardware environments, like the 
NAS system, offer the computational power, mem- 
ory, and disk space necessary for routine analysis of 
large structural models. New computing software 
environments, like the CSM Testbed, offer an in- 
tegrated system with data management, a general 
command language, and many different application
processors; these features enable the structural analyst
to develop new analysis methods and to tailor
the analysis for specific application needs. 

CSM Testbed Architecture Features 

The CSM Testbed is a Fortran program organized 
as a single, executable file (called a macroproces- 
sor) which calls structural applications modules that 
have been incorporated as subroutines. The macro- 
processor and applications modules interface with 
the operating system for their command input and 
data management functions through a set of com- 
mon “architectural utilities.” Processors access the 
Testbed utilities by calling entry points implemented 
as Fortran-77 functions and subroutines which are
available to module developers in the Testbed object 
libraries. Applications processors do not communi- 
cate directly with each other, but instead commu- 
nicate by exchanging named data objects in a data 
base managed by a data manager called GAL (global 
access library). The user controls the execution of ap- 
plications processors using an interactive, or batch, 
command run stream written in a command language 
for applied mechanics processors (CLAMP), which is 
processed by the command language interpreter pro- 
gram (CLIP). 

Command Language 

The Testbed command language CLAMP is a 
generic language originally designed to support the 
NICE system and to offer program developers the 
means for building problem-oriented languages 
(ref. 10). It may be viewed as a stream of free-field
command records read from an appropriate command
source (the user's terminal, actual files, or
processor messages).

Table 1. Selected CSM Testbed Processors

Processor   Description
---------   -----------
ELD         Element definition (connectivity, material properties, etc.)
LAU         Laminate analysis utility for 2-D and 3-D elements
E           Element-state initiation (builds element information packets)
EKS         Computes the element intrinsic stiffness matrices
TOPO        Analyzes the finite-element mesh topology and builds tables
            to drive assembly and factorization of system matrices
RSEQ        Renumbers nodes for minimum fill or minimum bandwidth
AUS         Arithmetic utilities
K           Assembles unconstrained system stiffness matrix
M           Assembles unconstrained system mass matrix
INV         Applies constraints and factors assembled system matrix
SSOL        Performs forward reduction and back substitution
BAND        Factors and solves using profile or banded solvers
ITER        Factors and solves using iterative solvers
KG          Forms and assembles unconstrained system geometric
            stiffness matrix
EIG         Solves linear algebraic eigenproblems
ES          Generic element processor shell
VEC         Performs variety of vector algebra operations

Figure 1. Organization of CSM Testbed software.

Figure 2. Distributed computing environment of CSM.

The commands are interpreted
by a “filter” utility called CLIP, whose function is 
to produce object records for use by its user pro- 
gram. The standard operating mode of CLIP is the 
processor-command mode. Commands are directly 
supplied by the user, retrieved from ordinary card- 
image files or extracted from the global data base, 
and submitted to the running processor. Special 
commands, called directives, are processed directly 
by CLIP; the processor is “out of the loop.” Tran- 
sition from processor-command to directive mode is 
automatic. Once the directive is processed, CLIP 
returns to processor-command mode. Directives are 
used to dynamically change run-environment param- 
eters, to process advanced language constructs such 
as macrosymbols and command procedures, to imple- 
ment branching and cycling, and to request services 
of the data manager. CLIP can be used in this way 
to provide data to a processor as well as to control 
the logic flow of the program through a single input 
stream. All command language directives are avail- 
able to any processor that uses the CLIP-processor 
interface entry points. 
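
The two operating modes can be pictured with a small sketch of a command filter. This is an illustration only, not the CLIP implementation: the function `clip_filter`, the `*def` directive form, and the `<name>` macrosymbol syntax used here are assumptions made for the example. Records that begin with an asterisk are treated as directives and handled by the filter itself; all other records are passed to the running processor.

```python
def clip_filter(lines, processor, macros=None):
    """Route command records: directives are handled here, the rest go to the processor."""
    macros = {} if macros is None else macros
    for line in lines:
        record = line.strip()
        if not record:
            continue
        if record.startswith("*"):
            # Directive mode: the processor is "out of the loop".
            name, _, rest = record[1:].partition(" ")
            if name == "def":
                key, _, value = rest.partition("=")
                macros[key.strip()] = value.strip()
            else:
                raise ValueError(f"unsupported directive in this sketch: {name}")
        else:
            # Processor-command mode: expand macrosymbols, then submit the record.
            for key, value in macros.items():
                record = record.replace(f"<{key}>", value)
            processor(record)
    return macros

received = []                      # stands in for a running processor
symbols = clip_filter(
    ["*def nodes = 4", "MESH <nodes>", "SOLVE"],
    received.append,
)
```

After the run stream is processed, the stand-in processor has received only `MESH 4` and `SOLVE`; the directive never reaches it.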

Data Manager 

The data manager within the CSM Testbed was 
derived from the global access library (GAL) con- 
cept developed at the Lockheed Palo Alto Research 
Laboratory (ref. 11). Methods for data management 
in structural analysis programs can be divided into 
three levels of complexity: file systems, file partition 
systems, and data-base systems (ref. 12). Since data- 
base files are subdivided or partitioned into data sets, 
the Testbed data manager is classified as a file par- 
tition manager. To a processor, a GAL data library 
is analogous to a file. It must be opened, written, 
read, closed, and deleted explicitly. The GAL re- 
sides on a direct-access disk file and contains a di- 
rectory structure called a table of contents (TOC), 
through which specific data sets may be addressed. 
Low-level I/O routines access the GAL library file in 
a word-addressable scheme, as described by Felippa 
(ref. 13). The data management system is accessi- 
ble to the user through the command language di- 
rectives and to the running processors through the 
GAL-processor interface. 

The global data base is made up of sets of data 
libraries (GAL’s) residing on direct-access disk files. 
Data libraries are collections of named data sets, 
which are collections of data set records. The data 
library format supported by the Testbed is called 
GAL/82, which can contain nominal data sets made 
up of named records. Some of the advantages to 
using this form of data library are (1) the order 
in which records are defined is irrelevant, (2) the
data contained in the records may be accessed from
the command level, and (3) the record data type 
is maintained by the manager, and this simplifies 
context-directed display operations and automatic 
type conversion. 
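
The organization of a data library into named data sets made up of named records can be sketched as follows. The class `DataLibrary` and its method names are hypothetical and hold everything in memory; the actual GAL resides on a direct-access disk file and is reached through the GAL-processor interface. The sketch shows only the table of contents, access by name, and the record type kept by the manager.

```python
class DataLibrary:
    """In-memory stand-in for a data library of named data sets and named records."""

    def __init__(self):
        # Table of contents: data set name -> {record name -> (type name, values)}
        self.toc = {}

    def put_record(self, dataset, record, values):
        type_name = type(values[0]).__name__
        if any(type(v).__name__ != type_name for v in values):
            raise TypeError("all items of a record must share one data type")
        self.toc.setdefault(dataset, {})[record] = (type_name, list(values))

    def get_record(self, dataset, record):
        return list(self.toc[dataset][record][1])

    def record_type(self, dataset, record):
        # The manager, not the caller, remembers the data type of each record.
        return self.toc[dataset][record][0]

    def table_of_contents(self):
        return sorted(self.toc)

library = DataLibrary()
# Records may be defined in any order and retrieved by name.
library.put_record("NODAL.DISPLACEMENT", "W", [0.0, -0.02, -0.05])
library.put_record("NODAL.DISPLACEMENT", "NODE_ID", [1, 2, 3])
```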

To provide the efficiency required to process the
volume of data needed for a complex structural
analysis, the usual overhead associated with Fortran
I/O has been eliminated. The actual I/O interface between
the GAL data manager and the UNIX operating
system is accomplished through a set of block I/O
routines written in the C programming language. For 
non-UNIX computer systems, this interface is accom- 
plished through a set of assembly language routines 
which are unique to each computer system. 

Interprocessor Control 

The SuperCLIP capability of the Testbed archi- 
tecture performs interprocessor control, which allows 
independent programs which use the Testbed archi- 
tecture facilities (CLIP and GAL) to be executed 
from within a single Testbed run stream. SuperCLIP 
handles the interprocessor CLIP-state preservation 
and restoration so that the CLIP environment is 
maintained across independent program executions. 
These independent programs can be used in conjunc- 
tion with the Testbed macroprocessor, with other in- 
dependent Testbed processors, or entirely alone, as 
appropriate to accomplish the required task. The 
implementation of SuperCLIP is the most complex 
and machine-dependent element of the Testbed archi- 
tecture software. To date, it has been implemented 
under the VAX/VMS and the UNIX operating
systems. 

User Interface 

The user may develop run streams using the 
high-level command language CLAMP for a specific 
engineering problem (ref. 10). These run streams 
may contain CLAMP directives and CLAMP pro- 
cedures which are processed by the command lan- 
guage interpreter CLIP. Applications processors are 
called using the [XQT command, or the GAL (e.g., 
ref. 11) may be interrogated. Engineers typically in- 
teract with the Testbed using simple run streams 
or through CLAMP procedures. Researchers in- 
teract using CLAMP procedures (e.g., to study 
nonlinear solution strategies) or through Fortran pro- 
cessors (e.g., to implement new element formula- 
tions). Developers interact with the entire Testbed 
architecture, including the design of the command 
language, the data handling techniques for large- 
scale analyses, and the strategy for I/O on parallel 
computers. 

CSM Testbed Structural Analysis Features

The CSM Testbed presently provides structural 
analysis capabilities that permit an analyst to per- 
form large-scale nonlinear stress analyses of shell- 
type structures. Three-dimensional stress analyses 
are presently limited to linear elastic orthotropic ma- 
terials. Eigenvalue problems associated with either 
linear bifurcation buckling or linear vibration analy- 
ses may also be solved. Transient dynamic analyses 
are limited to linear elastic problems with either di- 
rect time integration or mode superposition used to 
obtain the transient response. Some of the newly 
developed engineering features of the CSM Testbed 
are the equation solvers, the element library, the ma- 
terial modeling, and the solution procedures. Inter- 
face utilities to and from the PATRAN graphics sys- 
tems have been developed to support the modeling 
and analysis of large-scale structures. Access to such 
a preprocessing and postprocessing software system 
enhances the structural analyst’s ability to under- 
stand the structural behavior through visualization 
of the computed results. 

Equation Solvers 

The linear system of equations that arises in static
structural analysis applications has the form Ku = f, 
where K is the symmetric, positive-definite stiffness 
matrix, f is the load vector, and u is the vector of gen- 
eralized displacements. Such linear systems can be 
as large as several hundred thousand degrees of free- 
dom (dof) and often require significant computing 
resources (both memory and execution time). The 
structure of the stiffness matrices in these applica- 
tions is often sparse, although in many applications 
an ordering of the nodes which minimizes the band- 
width makes banded or profile (skyline) type storage 
of the matrices practical. The choice of a particular 
method to solve Ku = f will depend on the nonzero 
structure of K and, in the case of the iterative meth- 
ods, the condition number of K. In addition, the 
architecture of the computer, particularly for mod- 
ern vector and parallel computers, influences both 
the choice and the implementation of methods used 
to solve these linear systems of equations. Ortega 
(ref. 14) presents a thorough description of these var- 
ious methods and their implementations as applied 
to vector and parallel computers. 

The data structure of the global stiffness matrix 
is a key factor in the design and implementation of 
equation solvers for the CRAY-2 architecture and the 
Testbed software (e.g., ref. 15). The generation of 
stiffness matrices is accomplished by several differ- 
ent processors producing element stiffness matrices, 
defining boundary conditions, applied loads, and ordering
of nodes, and assembling the global stiffness
matrix. The stiffness matrices are stored in a nodal-block
sparse form. The original sparse out-of-core
Choleski solver used by the Testbed code (proces- 
sors INV and SSOL) factors and solves the stiffness 
matrices using this data structure. A major source 
of inefficiency for this equation solver on a CRAY-2 
computer system is that the operations carried out 
in factoring the stiffness matrix and solving the re- 
sulting triangular systems are carried out using these 
small nodal blocks (usually 3 x 3 or 6 x 6 in size). 
The vector length of these operations is therefore six 
or less and the code is faster when run without vector 
optimization. 

The new vectorized equation solvers (processors 
BAND and ITER) require K to be stored in one of 
several different sparse and banded storage schemes. 
Processor ITER contains three conjugate gradient it- 
erative methods. These methods vary in their types 
of preconditioning, which include diagonal scaling, 
incomplete Choleski factorization with a sparse stor- 
age scheme, and incomplete Choleski factorization 
with a diagonal storage scheme. Processor BAND 
contains three basic algorithms that are all based 
on Choleski factorization of banded matrices. The 
first algorithm uses the standard LINPACK routines 
(ref. 16) for banded solvers, namely SPBFA and 
SPBSL. The second algorithm, kji Choleski, uses 
column storage of the lower triangular part of the 
symmetric matrix to take advantage of vectors with 
a constant stride of one and loop unrolling to a depth 
or level of four. Loop unrolling reduces the number of 
memory references by holding vectors longer in the 
registers and increases the amount of vector com- 
putations within a loop. As a result, many of the 
multiplication and subtraction operations and mem- 
ory references will overlap, leading to greater perfor- 
mance. In addition, the local memory of the CRAY-2 
computer system is used to store up to four columns 
of the factored matrix to further decrease execution 
time. The third algorithm uses profile storage of the 
matrix instead of banded storage, and this type of 
storage results in a significant reduction in memory 
requirements and in the number of operations. 
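
A minimal sketch of the kji ordering with column storage of the lower band is given here to make the loop structure concrete. It is not the processor BAND code: the function names are hypothetical, and the loop unrolling and the use of CRAY-2 local memory are omitted. The innermost loop runs down a single stored column, which is the unit-stride access pattern the column storage is meant to provide.

```python
import math

def banded_cholesky(cols, bw):
    """In-place kji Cholesky factorization of a symmetric positive-definite band matrix.

    cols[j][i - j] holds A[i][j] for j <= i <= min(j + bw, n - 1), that is,
    the lower band stored by columns. On return cols holds L with A = L L^T.
    """
    n = len(cols)
    for k in range(n):
        pivot = cols[k][0]
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        pivot = math.sqrt(pivot)
        col_k = cols[k]
        for r in range(len(col_k)):
            col_k[r] /= pivot
        last = min(k + bw, n - 1)
        for j in range(k + 1, last + 1):
            l_jk = col_k[j - k]
            col_j = cols[j]
            # Innermost loop: update column j with unit stride.
            for i in range(j, last + 1):
                col_j[i - j] -= col_k[i - k] * l_jk
    return cols

def banded_solve(cols, bw, f):
    """Solve L L^T u = f by forward reduction and back substitution."""
    n = len(cols)
    u = list(f)
    for j in range(n):                       # forward reduction, L y = f
        u[j] /= cols[j][0]
        for i in range(j + 1, min(j + bw, n - 1) + 1):
            u[i] -= cols[j][i - j] * u[j]
    for j in range(n - 1, -1, -1):           # back substitution, L^T u = y
        for i in range(j + 1, min(j + bw, n - 1) + 1):
            u[j] -= cols[j][i - j] * u[i]
        u[j] /= cols[j][0]
    return u

# Tridiagonal test matrix [[2,-1,0],[-1,2,-1],[0,-1,2]] with half-bandwidth 1.
K_cols = [[2.0, -1.0], [2.0, -1.0], [2.0]]
factor = banded_cholesky(K_cols, 1)
u = banded_solve(factor, 1, [1.0, 0.0, 1.0])
```

For this small test system the exact solution is a vector of ones, which gives a quick check on the factor and solve steps.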

The strategy used for the vectorized equation 
solvers involves four steps. First, the coefficients of 
the unconstrained stiffness matrix are read from the 
global data base into a temporary array. Second, the 
nodal constraint information and node ordering se- 
quence information are retrieved from the global data 
base. Third, the appropriate pointer arrays for the 
new storage scheme are formed. Finally, the coeffi- 
cients of K are placed in a singly dimensioned array 
and modifications are made to the right-hand side (f ) 
corresponding to any applied displacements. For 
the direct Choleski methods, an additional storage 
scheme is included to reformat Testbed stiffness matrices
into the standard LINPACK (ref. 16) banded
storage format. The reformatting procedure is essen- 
tially sequential, but the time to reformat the ma- 
trices is small compared with the time to solve the 
equations for large problems. 
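
The right-hand-side modification for applied displacements in the final step can be sketched with one common textbook approach; whether the Testbed solvers use exactly this form is an assumption. For each prescribed degree of freedom, the corresponding column of K times the prescribed value is moved to the right-hand side, and the row and column are then replaced by an identity equation. The function name and the dense list-of-lists storage are hypothetical and chosen for brevity.

```python
def apply_prescribed_displacements(k, f, prescribed):
    """Return (k_mod, f_mod) with prescribed displacements imposed.

    k is a dense symmetric matrix (list of lists), f the load vector, and
    prescribed maps a degree-of-freedom index to its applied displacement.
    """
    n = len(f)
    k_mod = [row[:] for row in k]
    f_mod = list(f)
    # Move the known terms K[i][dof] * value to the right-hand side.
    for dof, value in prescribed.items():
        for i in range(n):
            f_mod[i] -= k[i][dof] * value
    # Replace each prescribed row and column by the identity equation u[dof] = value.
    for dof, value in prescribed.items():
        for i in range(n):
            k_mod[i][dof] = 0.0
            k_mod[dof][i] = 0.0
        k_mod[dof][dof] = 1.0
        f_mod[dof] = value
    return k_mod, f_mod

K = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
f = [0.0, 0.0, 0.0]
K_mod, f_mod = apply_prescribed_displacements(K, f, {0: 1.0})
```

The modified matrix remains symmetric and positive definite, so the same Choleski-based solvers can be applied to it.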

The capability to reorder the nodes automatically 
is an important part of the equation solving process 
in general-purpose finite-element codes. The struc- 
ture of the assembled stiffness matrices is determined 
by the node connectivities and the node number- 
ing scheme used in the finite-element model. Al- 
though the node connectivity is fixed by the problem 
definition and discretization, many node orderings 
are possible. The Testbed software contains proces- 
sor RSEQ, which uses four different algorithms to 
reorder nodes automatically. These algorithms are 
nested dissection, minimum degree, reverse Cuthill- 
McKee, and Gibbs-Poole-Stockmeyer. The first two 
algorithms are used by sparse solvers and minimize 
fill in the factorization process. The last two are 
profile and bandwidth minimizing routines, respec- 
tively. The direct banded solvers implemented in 
processor BAND are most efficient with node or- 
derings which minimize bandwidth, while the sparse 
out-of-core Choleski equation solver in processor INV 
is most efficient with orderings which minimize fill. 
For the various preconditioned conjugate gradient 
methods in processor ITER, the preconditioner used 
determines which ordering is best. Although the 
precise relationship between node ordering and the 
convergence rate of the incomplete Choleski conju- 
gate gradient (ICCG) is not known, preliminary re- 
sults show that the ordering of nodes can have a great 
effect on the convergence rate. In the test problems 
used with the ICCG method, the convergence rate of 
ICCG is better for the sparse, minimum-fill orderings 
than for the bandwidth-minimizing orderings. How- 
ever, in some cases, the ordering used to define the 
problem gives the best convergence rate. For the ba- 
sic conjugate gradient method, the matrix structure 
has no effect on the convergence rate but the matrix 
structure is important for the storage requirements if 
diagonal storage is used. Orderings which minimize 
bandwidth also concentrate the coefficients near the 
main diagonal, thereby minimizing the number of di- 
agonals required for matrix storage. As a result, the 
vector lengths of the diagonals are longer and the 
number of extra zeros added between nonzero coef- 
ficients is fewer; thus, the memory requirements are 
reduced and the computation speed is increased. 
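
The effect of node ordering on bandwidth can be seen in a small sketch of the reverse Cuthill-McKee idea. This is a simplified version written for illustration (breadth-first numbering from a minimum-degree node, neighbors visited in order of increasing degree, final order reversed); it is not the processor RSEQ implementation, and the function names are hypothetical.

```python
from collections import deque

def reverse_cuthill_mckee(adjacency):
    """Return a node order (list of old node numbers) from a simplified RCM pass."""
    n = len(adjacency)
    degree = [len(adjacency[v]) for v in range(n)]
    visited = [False] * n
    order = []
    # Start each connected component from its lowest-degree unvisited node.
    for start in sorted(range(n), key=lambda v: (degree[v], v)):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adjacency[v], key=lambda x: (degree[x], x)):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    order.reverse()
    return order

def bandwidth(edges, position):
    """Largest difference in node numbers over all connected node pairs."""
    return max(abs(position[a] - position[b]) for a, b in edges)

# A chain of five nodes numbered poorly: 0-3-1-4-2.
edges = [(0, 3), (3, 1), (1, 4), (4, 2)]
adjacency = [[] for _ in range(5)]
for a, b in edges:
    adjacency[a].append(b)
    adjacency[b].append(a)

old_bandwidth = bandwidth(edges, list(range(5)))
order = reverse_cuthill_mckee(adjacency)
new_position = {node: index for index, node in enumerate(order)}
new_bandwidth = bandwidth(edges, new_position)
```

For this chain the reordering reduces the half-bandwidth from 3 to 1, so the stiffness matrix becomes tridiagonal.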

Generic Element Processor Template 

The generic element processor template shown in 
figure 3 provides the element developer with a standard
outer software “shell” that handles all user-command
input and all I/O to and from the global
data base. In addition, a standard set of “shell-to-kernel”
interface routines (e.g., ES_K, ES_M, and
ES_F) are provided as cover routines for the element 
developer’s “kernel” routines. The function of the 
interface routines is to perform the transformation 
between the standard argument lists of the outer soft- 
ware “shell” and those of the element developer’s per- 
sonal code. The element developer’s kernel routines 
are integrated with these interface routines through 
the convention that the interface subroutine names 
and argument lists are standardized. The indepen- 
dent structural element processors (i.e., processor 
ESi, where i = 1, 2, ...) are installed and readily accessible
to all CSM researchers for small benchmark
problems as well as large-scale application problems. 


Figure 3. Generic element processor template.

The generic element processor features a standard 
high-level procedure ES that processes user com- 
mands such as 

*call ES ( function = 'FORM STIFFNESS/MATL' ; es_proc = ESi )

All the ESi processors are driven by a common set of
commands through calls to the ES procedure and cre- 
ate the same data structure, regardless of how the ele- 
ment developer programmed the kernel routines (e.g., 
the element stiffness calculations and element stress 
recovery). This approach provides an extendible 
and easy-to-use vehicle for integrated finite-element 
research, development, and application within the 
CSM Testbed. 
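
The division of labor between the shell, the cover routines, and the developer's kernel can be sketched as follows. All names in the sketch (`ElementShell`, `BarKernel`, `es_k`, the `ES_BAR` processor name, and the dictionary used as a data base) are hypothetical; the actual template is Fortran and writes to the global data base. The point illustrated is that the shell issues the same command for every element processor and stores the same data structure, while only the cover routine knows the kernel's own argument list.

```python
class BarKernel:
    """Hypothetical developer kernel: two-node axial bar, one dof per node."""

    def stiffness(self, length, area, modulus):
        k = modulus * area / length
        return [[k, -k], [-k, k]]

class ElementShell:
    """Stand-in for the standard outer shell shared by all element processors."""

    def __init__(self):
        self.kernels = {}    # processor name -> kernel object
        self.database = {}   # (data set name, element id) -> stored matrix

    def install(self, es_proc, kernel):
        self.kernels[es_proc] = kernel

    def es_k(self, es_proc, element_id, element):
        # Cover routine: translate the shell's standard data into the
        # argument list expected by the developer's kernel routine.
        kernel = self.kernels[es_proc]
        matrix = kernel.stiffness(element["length"], element["area"], element["modulus"])
        self.database[("STIFFNESS", element_id)] = matrix
        return matrix

    def call_es(self, function, es_proc, elements):
        if function != "FORM STIFFNESS/MATL":
            raise ValueError(f"function not covered in this sketch: {function}")
        for element_id, element in elements.items():
            self.es_k(es_proc, element_id, element)

shell = ElementShell()
shell.install("ES_BAR", BarKernel())
shell.call_es(
    "FORM STIFFNESS/MATL",
    "ES_BAR",
    {1: {"length": 2.0, "area": 0.5, "modulus": 200.0}},
)
```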

A key feature of the generic element proces- 
sor shell is the easy access to the utilities associ- 
ated with an element-independent corotational for- 
mulation (ref. 17). Through these utilities, element 
developers may readily attempt geometrically nonlinear
problems which exhibit large rotations. Only the 
basic element characteristics associated with linear 
strain-displacement relations are required from the 
element developer in the kernel routines. Extensions 
to include the nonlinear strain-displacement relations 
require the element developer to provide additional 
kernel routines (e.g., internal force calculations). 

Presently only two-dimensional shell elements and 
three-dimensional solid elements have been installed 
in the CSM Testbed with the generic element processor
template. Processor ES1 contains a family
of four- and nine-node continuum-based resultant 
(CBR) quadrilateral shell elements (ref. 18). This 
family of elements includes the assumed-natural co- 
ordinate strain quadrilateral shell elements (ref. 19) 
and the Lagrangian quadrilateral shell elements 
with selectively reduced integration. Processor ES2 
contains a new hybrid curved four-node quadrilateral
shell element (ref. 20). Processor ES3
contains a family of three-dimensional hybrid solid 
elements, including 8- and 20-node bricks (hexa- 
hedrons), 6- and 15-node wedges (pentahedrons), and 
4- and 10-node pyramids (tetrahedrons). Processor
ES4 contains a family of hybrid plate-shell elements,
including four-node quadrilateral and three-node
triangular elements. Processor ES5 contains
a displacement-based, four-node quadrilateral plate- 
shell element, denoted the 410 element, from the 
STAGSC-1 computer code (ref. 21). Additional ESi
processors are under development. In addition, ele- 
ments in the original element library of Level 13 of 
SPAR (ref. 8) are currently still available for linear 
analyses. 

Material Modeling 

The material modeling features of the CSM 
Testbed are directed toward the analysis require- 
ments of laminated composite structures. Consti- 
tutive relations for classical and shear flexible two- 
dimensional plate and shell models as well as for 
three-dimensional solids are evaluated and available 
to the element developer or structural analyst. Pro- 
cessor LAU is a laminate analysis utility for calcu- 
lating the constitutive relations for two-dimensional 
and three-dimensional isotropic, orthotropic, and 
laminated structures. The formulation is based on 
the usual lamination theory (e.g., refs. 22 and 23) 
whereby the laminate constitutive relations are de- 
rived from the constitutive relations for each layer in 
the laminate. With the midplane strains and curva- 
tures, the in-plane strains and corresponding stresses 
in each layer of the laminate may be calculated and 
used to evaluate selected stress- and strain-based failure
criteria. The failure criteria implemented in the
Testbed include the maximum stress criteria, the
maximum strain criteria, and several quadratic poly- 
nomial failure criteria. 
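
The way laminate constitutive relations are built up from the layer relations can be sketched with the standard classical lamination theory formulas for the extensional (A), coupling (B), and bending (D) stiffness matrices. This is a textbook illustration, not the processor LAU code; the function names are hypothetical and the material values are made up for the example. Transverse shear terms and failure criteria are omitted.

```python
import math

def reduced_stiffness(e1, e2, nu12, g12):
    """Plane-stress stiffnesses Q11, Q12, Q22, Q66 in the material axes of one layer."""
    nu21 = nu12 * e2 / e1
    denom = 1.0 - nu12 * nu21
    return e1 / denom, nu12 * e2 / denom, e2 / denom, g12

def transformed_stiffness(q, angle_deg):
    """Rotate layer stiffnesses to the laminate axes; returns a symmetric 3 x 3 matrix."""
    q11, q12, q22, q66 = q
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    b11 = q11 * c**4 + 2.0 * (q12 + 2.0 * q66) * s**2 * c**2 + q22 * s**4
    b22 = q11 * s**4 + 2.0 * (q12 + 2.0 * q66) * s**2 * c**2 + q22 * c**4
    b12 = (q11 + q22 - 4.0 * q66) * s**2 * c**2 + q12 * (s**4 + c**4)
    b66 = (q11 + q22 - 2.0 * q12 - 2.0 * q66) * s**2 * c**2 + q66 * (s**4 + c**4)
    b16 = (q11 - q12 - 2.0 * q66) * s * c**3 + (q12 - q22 + 2.0 * q66) * s**3 * c
    b26 = (q11 - q12 - 2.0 * q66) * s**3 * c + (q12 - q22 + 2.0 * q66) * s * c**3
    return [[b11, b12, b16], [b12, b22, b26], [b16, b26, b66]]

def laminate_abd(layers):
    """layers: list of (angle_deg, thickness, (e1, e2, nu12, g12)), bottom to top."""
    total = sum(thickness for _, thickness, _ in layers)
    a = [[0.0] * 3 for _ in range(3)]
    b = [[0.0] * 3 for _ in range(3)]
    d = [[0.0] * 3 for _ in range(3)]
    z_low = -0.5 * total                      # midplane is at z = 0
    for angle, thickness, props in layers:
        z_high = z_low + thickness
        qbar = transformed_stiffness(reduced_stiffness(*props), angle)
        for i in range(3):
            for j in range(3):
                a[i][j] += qbar[i][j] * (z_high - z_low)
                b[i][j] += qbar[i][j] * (z_high**2 - z_low**2) / 2.0
                d[i][j] += qbar[i][j] * (z_high**3 - z_low**3) / 3.0
        z_low = z_high
    return a, b, d

# Made-up layer properties (E1, E2, nu12, G12) in arbitrary consistent units.
ply = (140.0, 10.0, 0.3, 5.0)
layup = [(0.0, 0.125, ply), (90.0, 0.125, ply), (90.0, 0.125, ply), (0.0, 0.125, ply)]
A, B, D = laminate_abd(layup)
```

For this symmetric cross-ply layup the coupling matrix B vanishes to rounding error, and the bending stiffness is larger in the direction of the outer 0-degree layers.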

Solution Procedures 

Various types of analysis may be performed with 
the CSM Testbed through the use of either or- 
dinary run streams which execute various proces- 
sors sequentially or CLAMP procedures which exe- 
cute directives and processors and perhaps call other 
procedures. Linear stress analyses and eigenvalue 
analyses are both performed using simple analysis
run streams. Solution procedures that require
looping and branching are more complex than the
linear analysis procedures. Two sets of solution
procedures that require looping have been written
and may be used to solve various application
problems. The first solution procedure is named 
NEWMARK. Its function is to perform a linear tran- 
sient dynamic analysis using the well-known New- 
mark method for direct time integration of the equa- 
tions of motion. The second set of procedures is 
named NL_STATIC_1. These procedures are used to 
perform a geometrically nonlinear static analysis using
a modified Newton-Raphson algorithm with corota- 
tional updates and the Riks-linearized-Crisfield arc- 
length control strategy (refs. 24 and 25) for either 
applied force or applied displacement problems. 
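
The time-stepping loop of the Newmark method can be sketched for a single degree of freedom. This is an illustration under stated assumptions, not the NEWMARK procedure itself: the function name is hypothetical, the average-acceleration parameters (beta = 1/4, gamma = 1/2) are chosen as defaults, and the scalar division stands in for the solution of a linear system with the effective matrix in the multi-degree-of-freedom case.

```python
import math

def newmark_sdof(mass, damping, stiffness, force, u0, v0, dt, steps,
                 beta=0.25, gamma=0.5):
    """Integrate m*a + c*v + k*u = force(t); returns a list of (t, u, v)."""
    u, v = u0, v0
    a = (force(0.0) - damping * v - stiffness * u) / mass
    # Effective coefficient; constant for a linear problem with a fixed time step.
    effective_mass = mass + gamma * dt * damping + beta * dt * dt * stiffness
    history = [(0.0, u, v)]
    for n in range(1, steps + 1):
        t = n * dt
        # Predictors based on the state at the previous step.
        u_pred = u + dt * v + (0.5 - beta) * dt * dt * a
        v_pred = v + (1.0 - gamma) * dt * a
        # Equilibrium at the new time gives the new acceleration.
        a = (force(t) - damping * v_pred - stiffness * u_pred) / effective_mass
        # Correctors.
        u = u_pred + beta * dt * dt * a
        v = v_pred + gamma * dt * a
        history.append((t, u, v))
    return history

# Undamped free vibration: exact solution is u(t) = cos(2 t).
history = newmark_sdof(mass=1.0, damping=0.0, stiffness=4.0,
                       force=lambda t: 0.0, u0=1.0, v0=0.0,
                       dt=0.01, steps=1000)
t_end, u_end, v_end = history[-1]
exact_end = math.cos(2.0 * t_end)
```

With the average-acceleration parameters the scheme conserves the energy of this undamped linear oscillator, so the error appears as a small period elongation rather than as amplitude decay.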

Application Studies Using CSM Testbed 

Research in methods development for the CSM 
Testbed is driven in part by the analysis deficiencies 
identified in the solution of various application prob- 
lems. The LaRC CSM activity uses the concept of 
focus problems to provide a common set of structural 
analysis problems for all CSM participants. Focus 
problems may be entire aerospace vehicles or various 
subcomponents that pose difficult computational and 
structural mechanics problems. These focus prob- 
lems help guide methods research and development 
for generic classes of problems. New focus problems 
are selected as new technology evolves and computa- 
tional structural mechanics methodology develops. A 
wide range of CSM application studies are presented 
in reference 26. Problems selected for presentation 
here are the following: 

• Composite blade-stiffened panel with discon- 
tinuous stiffener 

• Circular cylindrical shell with two rectangular 
cutouts 

• Impulsively loaded truncated conical shell 

• Space Shuttle solid rocket booster 

These application studies demonstrate the struc- 
tural analysis capabilities of the CSM Testbed. The 
analyses presented herein utilize solution procedures 
implemented through the CLAMP language as well 
as various finite elements implemented through the 
generic element processor template. The execu- 
tion times for selected CSM Testbed processors are 
compared for the various analysis problems consid- 
ered. Postprocessing of the results, including both 
deflections and stress resultants, is performed with 
PATRAN to help the analyst visualize the com- 
puted results, and examples of this capability are also 
presented. 

Composite Blade-Stiffened Panel With 
Discontinuous Stiffener 

Discontinuities and eccentricities are usually 
present in practical structures. In addition, potential 
damage of otherwise perfect structures is often an im- 
portant design consideration. Predicting the struc- 
tural response in the presence of discontinuities, ec- 
centricities, and damage is particularly difficult when 
the component is built from brittle composite ma- 
terials or is loaded into the nonlinear range. The 
nonlinear response of a flat, blade-stiffened graphite- 
epoxy panel with a discontinuous stiffener loaded in 
axial compression (fig. 4(a)) was chosen as a focus 
problem and is summarized in this section. A more 
complete discussion of this problem is presented in 
reference 26. 

This problem represents a generic class of lam- 
inated composite structures with discontinuities in 
which the interlaminar stress state becomes impor- 
tant. This problem is characterized by a discontinu- 
ity (the hole), an eccentric loading, large displace- 
ments, large stress gradients, and a brittle material 
system. The geometry and laminate properties are 
given in reference 26. The loading is uniform ax- 
ial compression. The loaded ends of the panel are 
clamped and the sides are free. 

The finite-element model is shown in figure 4(b). 
A total of 144 9-node quadrilateral shell elements 
(processor ESI) are used in the nonlinear analysis. 
This model has 628 nodes and 2910 active degrees 
of freedom. The procedure NL_STATIC_1 is used to 
perform the nonlinear analysis. 

End-shortening results are shown in figure 5 as 
a function of the applied compressive load. The 
end shortening u is normalized by the overall panel 
length L, and the applied load P is normalized by the 
panel prebuckling extensional stiffness EA obtained 
from the test. The blade-stiffened panel with a 
discontinuous stiffener was tested to failure. Local 
failures occurred prior to overall panel failure, as 
is evident from the end-shortening results shown in 
figure 5. Good agreement between test and analysis 
is shown up to the load where local failures occurred. 



Figure 4. Composite blade-stiffened panel with discontinuous stiffener. (a) Stiffened panel (photograph L-79-7347). (b) Finite-element model.

Oblique views of two deformed shapes with exag- 
gerated deflections are shown in figure 6 for two val- 
ues of applied compressive load. Load A corresponds 
to approximately half the value of load B (37 800 lb). 
Contour plots of the longitudinal in-plane stress
resultant N_x are also shown in figure 6. These N_x
distributions reveal several features of the global
structural behavior of this panel. First, away from the
discontinuity, the N_x distribution in the panel skin
is nearly uniform and approximately half the value
of N_x in the two outer blade stiffeners. Second,
the load is diffused from the center discontinuous
stiffener into the panel skin rapidly such that the
center stiffener has essentially no N_x load at the edge
of the hole. Third, the N_x load in the two outer
stiffeners increases towards the center of the panel
and, because of panel bending, is concentrated in the
blade tips. Fourth, the N_x load in the panel skin near
the center of the panel is much greater than the N_x
load in other portions of the panel skin.

Figure 5. End-shortening results for blade-stiffened panel. (Axis label: applied load, P/EA.)

Figure 6. Longitudinal in-plane stress resultant N_x distributions.

Longitudinal in-plane stress resultant distribu- 
tions at panel midlength are shown in figure 7 as 
a function of distance from the hole. The results in- 
dicate that high in-plane stresses and a high stress 
gradient exist near the hole. As the load increases, 
both the longitudinal in-plane stress resultant and 
the stress gradient increase near the hole and the 
blade stiffeners. These high in-plane stresses and 
stress gradients coupled with the large out-of-plane 
displacements at the free edge of the hole may cause 
material nonlinearities, local failures, and/or delam- 
inations to develop. 

Computation times for the nonlinear analysis of 
the composite blade-stiffened panel with a discontin- 
uous stiffener are given in table 2 for selected Testbed 
processors. The ratio of the overall execution time 
on a VAX 11/785 computer system to the execution 
time on the NAS CRAY-2 computer system is 17.2 
for the complete nonlinear analysis of the composite 
blade-stiffened panel with a discontinuous stiffener. 
The global tangent stiffness matrix was reevaluated 
and factored 15 times, and a total of 64 iterations 
were required to predict the nonlinear structural re- 
sponse of this panel. Most of the CPU time was spent 
in processors that perform computations on the el- 
ement level such as processor ESI, which computes 
new elemental tangent stiffness matrices. The ratio 
of the execution times for the evaluation of the el- 
emental stiffness matrices is lower than the ratio of 
the overall execution time. These processors (e.g., 
ELD and ESI) have not been modified to exploit 
the features of vector computers. However, proces- 
sor INV has been modified and performs 41.8 times 
faster on the NAS CRAY-2 computer system than on 
a VAX 11/785 computer system. 
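
The per-processor ratios quoted above follow directly from the CPU times in table 2. The short Python check below reproduces that arithmetic for the global stiffness formation and factoring phase; the dictionary layout and the helper name `speedup` are illustrative assumptions, and the processor names are those printed in the table.

```python
# CPU seconds from table 2: (NAS CRAY-2, VAX 11/785).
table2 = {
    "ES": (50.1, 821.0),    # elemental stiffness evaluation
    "K": (1.4, 26.8),       # global stiffness assembly
    "INV": (8.3, 347.0),    # factoring
}

def speedup(cray_sec, vax_sec):
    """Ratio of VAX 11/785 time to NAS CRAY-2 time."""
    return vax_sec / cray_sec

for name, (cray_sec, vax_sec) in table2.items():
    print(name, round(speedup(cray_sec, vax_sec), 1))
```

The INV ratio rounds to the 41.8 quoted in the text, while the ES ratio falls below the overall ratio of 17.2, consistent with the element-level processors not yet being vectorized.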

Performance results obtained with various direct 
solvers implemented in processor BAND are shown 
in table 3. Increased performance is obtained by 
using “loop unrolling” to level 4, where a column 
of K is updated by four columns at a time rather 
than one, and by also exploiting the local memory 
of the CRAY-2 computer system. For this problem, 
only the profile method in processor BAND performs 
better than processor INV. 
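
The idea behind loop unrolling to level 4 can be sketched with a dense Choleski factorization in which each target column is updated by four previously factored columns per sweep. This is only an illustration of the technique: it is a dense, left-looking column variant written in pure Python, the function name `cholesky_unrolled4` is hypothetical, and the loop ordering, banded or profile storage, and local-memory use of processor BAND are not reproduced here.

```python
import math

def cholesky_unrolled4(K):
    """Return lower-triangular L with L * L^T = K for a symmetric
    positive-definite matrix K given as a list of lists."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Work on the lower part of column j.
        col = [K[i][j] for i in range(j, n)]
        k = 0
        # Main sweep: apply four factored columns in a single pass, so
        # each entry of the target column is loaded and stored once
        # per four updates instead of once per update.
        while k + 4 <= j:
            a0, a1, a2, a3 = L[j][k], L[j][k + 1], L[j][k + 2], L[j][k + 3]
            for i in range(j, n):
                row = L[i]
                col[i - j] -= (a0 * row[k] + a1 * row[k + 1]
                               + a2 * row[k + 2] + a3 * row[k + 3])
            k += 4
        # Cleanup sweep for the remaining (fewer than four) columns.
        while k < j:
            a = L[j][k]
            for i in range(j, n):
                col[i - j] -= a * L[i][k]
            k += 1
        pivot = math.sqrt(col[0])
        for i in range(j, n):
            L[i][j] = col[i - j] / pivot
    return L

# Small test matrix K[i][j] = min(i, j) + 1, whose factor is all ones.
n = 6
K = [[float(min(i, j) + 1) for j in range(n)] for i in range(n)]
for row in cholesky_unrolled4(K):
    print(row)
```

On a vector machine the benefit comes from the reduced memory traffic per floating-point operation in the inner loop; in interpreted Python the sketch only demonstrates that the unrolled update produces the same factor.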


Circular Cylindrical Shell With Two 
Rectangular Cutouts 

A common structural configuration is that of a 
cylindrical shell (e.g., storage tanks, pipelines, aircraft
fuselages, and rocket motor cases). Shell-type
structures are generally sensitive to initial geometric
imperfections and to local discontinuities such
as cutouts. Many aerospace vehicles contain large 
cutouts (e.g., access holes and windows). The 
strength of these structures is limited to the static 
collapse load. Predicting the nonlinear collapse be- 
havior of these shell structures is a difficult and com- 
putationally intensive analysis problem. 

The circular cylindrical shell with two rectangu- 
lar cutouts loaded by uniform end shortening shown 
in figure 8 is representative of this class of struc- 
tures. This problem has also been used as a bench- 
mark problem by Hartung and Ball (ref. 27) for 
shell analysis computer codes and by Almroth and 
Brogan (ref. 28) for assessing shell elements. These 
researchers considered only one-eighth of the shell in 
their analyses. The results reported herein are com- 
pared with their results, and hence only one-eighth 
of the shell is modeled. The finite-element model is 
composed of 101 9-node quadrilateral shell elements 
(processor ESI), 449 nodes, and 2012 active degrees 
of freedom, as shown in figure 9(a). 

A linear bifurcation buckling analysis was per- 
formed prior to the nonlinear collapse analysis. The 
buckling load computed in this study is 1016 lb which 
agrees with the results presented by Hartung and 
Ball (ref. 27). The buckling mode shape indicates 
that the vertical edges of the cutout buckle locally. 

The nonlinear analysis of the cylinder with 
cutouts was performed with the procedure 
NL_STATIC_1. Out-of-plane deflections w are shown 
in figure 9(b) as a function of the applied load for 
two points (denoted as “a” and “b” in fig. 9(a)). 
The elastic collapse load predicted with the Testbed 
is 2846 lb — nearly three times the linear bifurcation 
buckling load. As the out-of-plane deflections near 
the vertical edges of the cutouts develop, the com- 
pressive stresses are redistributed away from these 
regions and the load is carried by the remaining por- 
tions of the shell, as shown in figure 10. 

Hartung and Ball (ref. 27) reported a collapse 
load of 2109 lb using a finite-difference version of the 
STAGS computer code. Later, Almroth and Bro- 
gan (ref. 28), in a convergence study using the finite- 
element version of STAGS, reported a “nearly” con- 
verged collapse load of 2750 lb. Additional research 
in shell element technology is needed in order to pro- 
vide analysts with reliable structural analysis tools. 
Table 2. Selected Processor Execution Times for Blade-Stiffened Panel
[628 nodes; 2910 degrees of freedom; average semi-bandwidth of 439]

Solution phase               Processor   NAS CRAY-2,   VAX 11/785,
                             name        CPU sec       CPU sec

Mesh generation              ELD             3.2           6.7
                             E               0.3          11.4
                             TOPO            1.6          15.8

Global stiffness matrix      ES             50.1         821.0
formation and factoring      K               1.4          26.8
                             INV             8.3         347.0
                             VEC             2.9          18.4
                             SSOL            0.9          28.8

Each iteration               SSOL            0.9          35.3
                             VEC             2.1          19.4
                             ES             13.5         230.1

Table 3. Performance of Direct Solvers in Processor BAND
[628 nodes; 2910 degrees of freedom; average semi-bandwidth of 439]

Method             NAS CRAY-2,   Compute rate,
                   CPU sec       MFLOPS

LINPACK               27.1           64.1
kji Choleski          27.4           63.4
kji Choleski*         17.7           98.2
kji Choleski**        12.7          136.9
kji profile           12.7           57.1
kji profile*           7.9           92.9
kji profile**          5.6          129.4

*Loop unrolling to level 4.
**Loop unrolling to level 4 and use of local memory.



Figure 7. Longitudinal in-plane stress resultant N_x distributions at panel midlength. 
Figure 8. Circular cylinder with cutouts: geometry, properties, and loading. Loading: uniform end shortening. Material properties: E = 10^7 psi, ν = 0.3. Dimensional quantities are in inches unless otherwise noted.



Figure 9. Nonlinear response of cylinder with cutouts. (a) Finite-element model. (b) Out-of-plane deflections.
Figure 10. Deformed geometry for several load levels.

Computation times for the nonlinear analysis of 
the cylinder with cutouts are given in table 4 for se- 
lected Testbed processors. The ratio of the overall 
execution time on a VAX 11/785 computer system 
to the execution time on the NAS CRAY-2 computer 
system is 17.7 for the complete nonlinear analysis of 
the cylinder with cutouts. The global tangent stiff- 
ness matrix was reevaluated and factored 31 times 
and a total of 122 iterations were required to predict 
the nonlinear structural response. Most of the CPU 
time was spent in processors that perform compu- 
tations on the element level, such as processor ESI. 
The ratio of the execution times for the evaluation of 
the elemental stiffness matrices is lower than the ratio 
of the overall execution time. These processors (e.g., 
ELD and ESI) have not been modified to exploit 
the features of vector computers. However, proces- 
sor INV has been modified and performs 44.1 times 
faster on the NAS CRAY-2 computer system than on 
a VAX 11/785 computer system for this problem. 


Impulsively Loaded Truncated Conical Shell 

A number of important engineering problems are 
associated with the prediction of the response of a 
shell to high-energy, short-duration dynamic loads. 
Examples include reentry vehicles, space vehicles 
subjected to pyrotechnic separation loads, and vehi- 
cles subjected to blast or impulse environments (e.g., 
water impact). Sometimes these high-energy loads 
only generate a rapidly varying linear elastic stress 
state, but in other cases the loads may be sufficiently 
high or of sufficient duration that the structural re- 
sponse is nonlinear. 

The linear elastic transient response of the trun- 
cated conical shell subjected to an impulse load 
(initial velocity) shown in figure 11 is selected as 
representative of these transient dynamic shell anal- 
ysis problems. This problem also has been used as 
a benchmark problem by Hartung and Ball (ref. 27). 
The finite-element model is composed of 540 4-node 
quadrilateral elements, 589 nodes, and 2569 active 
degrees of freedom. The predicted transient response 
shown in figure 12 for the normal deflections at two 
points on the shell correlates well with the results 
presented in reference 26. Both points are located 
6.5 in. from the clamped small-diameter edge, one at
θ = 0° (point a) and one at θ = 180° (point b). The
transient response was calculated for 1400 μsec using
the Newmark method with a time step of 2 μsec.
Oblique views of the deformed shape with exagger- 
ated deflections from the transient analysis at various 
points in time T are shown in figure 13. 
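
The time-stepping used for this analysis can be illustrated on a single-degree-of-freedom oscillator started with an initial velocity, which plays the role of the impulse load. The sketch below is not the NEWMARK procedure itself: the function name `newmark_sdof`, the unit mass, and the 400 μsec natural period are assumptions chosen for illustration, and the average-acceleration parameters (beta = 1/4, gamma = 1/2) are one common choice. The time step of 2 μsec and the 1400 μsec duration (700 steps) are taken from the text.

```python
import math

def newmark_sdof(m, c, k, u0, v0, dt, n_steps,
                 force=lambda t: 0.0, beta=0.25, gamma=0.5):
    """Newmark direct integration of m*u'' + c*u' + k*u = force(t).

    Returns the list of displacements at t = 0, dt, ..., n_steps*dt.
    """
    u, v = u0, v0
    a = (force(0.0) - c * v0 - k * u0) / m     # initial acceleration
    # Effective stiffness is constant for a linear system and fixed dt.
    k_eff = k + gamma / (beta * dt) * c + m / (beta * dt * dt)
    displacements = [u]
    for n in range(1, n_steps + 1):
        t = n * dt
        rhs = (force(t)
               + m * (u / (beta * dt * dt) + v / (beta * dt)
                      + (0.5 / beta - 1.0) * a)
               + c * (gamma / (beta * dt) * u + (gamma / beta - 1.0) * v
                      + dt * (0.5 * gamma / beta - 1.0) * a))
        u_new = rhs / k_eff
        a_new = ((u_new - u) / (beta * dt * dt) - v / (beta * dt)
                 - (0.5 / beta - 1.0) * a)
        v_new = v + dt * ((1.0 - gamma) * a + gamma * a_new)
        u, v, a = u_new, v_new, a_new
        displacements.append(u)
    return displacements

# Hypothetical oscillator: unit mass, 400 microsecond natural period.
omega = 2.0 * math.pi / 400.0e-6
history = newmark_sdof(m=1.0, c=0.0, k=omega**2, u0=0.0, v0=1.0,
                       dt=2.0e-6, n_steps=700)
print(len(history), max(history), 1.0 / omega)
```

For a finite-element model the scalars become the mass, damping, and stiffness matrices, and the division by the effective stiffness becomes a factorization performed once followed by one forward and back substitution per time step.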


Space Shuttle Solid Rocket Booster 

The basic elements of the Space Shuttle system 
are the orbiter, the external tank (ET), and the 
two reusable solid rocket boosters (SRB's). The 
SRB’s provide the primary shuttle ascent boost for 
the first 2 minutes of flight with an assist from the 
three Space Shuttle main engines (SSME’s) on the 
orbiter. A major subsystem of the SRB is the solid 
rocket motor (SRM), which consists of four lined, 
insulated rocket motor segments. These segments are 
connected with pinned tang-clevis joints. Each SRB 
is approximately 144 ft long and 12 ft in diameter. 

The linear elastic static analysis of the SRB
loaded by internal pressure was selected as representative
of a large-scale structural analysis problem
that is critical to NASA. The finite-element model
shown in figure 14 involves 9205 nodes with 1273 
2-node beam elements, 90 3-node triangular ele- 
ments, and 9156 4-node quadrilateral elements. Al- 
though the finite-element model involves 54 870 de- 
grees of freedom, it does not have the fidelity 
necessary to determine detailed stress distributions 
in particular SRB subsystems. In this global shell 
model, the field and factory joints are modeled with 
equivalent stiffness joints rather than detailed mod- 
els of the joint. As such, local joint behavior cannot 
be obtained from this global model. 

The linear stress analysis considered herein in- 
volves only the loading case of a uniform SRM in- 
ternal pressure of 1000 psi. An oblique view of the 
deformed geometry with exaggerated deflections is 
shown in figure 15. The deflection pattern exhibits 
a “pressure pillowing” behavior in the vicinity of the 
joints. The influence of the partial (270°) SRB/ET 
attachment ring on the SRB shell response is shown 
in figure 16. An abrupt change in the deflection 
pattern near the ends of the ET attachment ring is 
exhibited. 

Computation times for the SRB global shell anal- 
ysis are given in table 5 for selected Testbed proces- 
sors. The ratio of the overall execution time on a 
VAX 11/785 computer system to the execution time 
on the NAS CRAY-2 computer system is 35.6 for one 
linear stress analysis of the SRB global shell model. 
Most of the CPU time was spent in processor INV 
factoring the global stiffness matrix, a process which 
is 63.7 times faster on the NAS CRAY-2 computer 
system than on a VAX 11/785 computer system. 
However, several other processors (ELD, EKS, and 
TOPO) also used a sizeable amount of CPU time. 
All these processors need to be studied and improved 
for large-scale analysis problems. 

The new Testbed equation solvers implemented 
in processors BAND and ITER have also been 
Table 4. Selected Processor Execution Times for Cylinder With Cutouts
[449 nodes; 2012 degrees of freedom]

Solution phase               Processor   NAS CRAY-2,   VAX 11/785,
                             name        CPU sec       CPU sec

Mesh generation              ELD             4.8           5.3
                             E               0.3           9.1
                             TOPO            1.7          14.7

Global stiffness matrix      ES             33.4         549.7
formation and factoring      K               1.0          18.3
                             INV             9.8         432.2
                             VEC             3.9          15.5
                             SSOL            0.8          20.2

Each iteration               SSOL            0.8          20.6
                             VEC             1.9           8.0
                             ES              8.8         125.3


Figure 11. Truncated conical shell: geometry, properties, and loading. Mass density: ρ = 1.88 × 10^-4 lb-sec^2/in^4. Dimensional quantities are in inches unless otherwise noted.



Figure 12. Normal deflections at points a and b on truncated conical shell. (Axis label: time, μsec.)
Figure 13. Deformed shapes for truncated conical shell during transient analysis. Panels include T = 800 μsec, T = 1000 μsec, and T = 1200 μsec.


Figure 14. Finite-element model of SRB. 




Figure 15. Deformed geometry plot of global SRB shell model. 
Figure 16. Close-up of undeformed and deformed geometries at SRB/ET interface. 


Table 5. Selected Processor Execution Times for SRB Global Model
[9205 nodes; 54 870 degrees of freedom; average semi-bandwidth of 382]

Processor   NAS CRAY-2,   VAX 11/785,
name        CPU sec       CPU sec

ELD            248.6          460.8
E                2.0           70.7
EKS            168.3         1625.0
TOPO            94.3         1678.4
K               46.9          472.3
INV            804.1        51185.1
SSOL            17.2          295.6
applied to this problem. Using the skyline method 
in processor BAND with loop unrolling to level 4 
and exploiting local memory results in the solution 
time to factor and solve being 74.8 CPU sec on the 
NAS CRAY-2 computer system (a compute rate of 
127.9 MFLOPS). Processor BAND factors the global 
stiffness matrix in less than one-tenth the CPU time 
required by processor INV on the NAS CRAY-2 com- 
puter system. With use of the incomplete Choleski 
conjugate gradient method with a sparse storage 
scheme, the solution is obtained after 562 iterations. 
The solution time is 455 sec which corresponds to a 
computation rate of 20 MFLOPS. 
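
The figures quoted in this paragraph can be cross-checked with simple arithmetic, taking the compute rate in MFLOPS to mean millions of floating-point operations per CPU second. The helper name `operation_count` is an illustrative assumption; the times and rates are those stated above, and the INV time is the factoring time listed in table 5.

```python
def operation_count(cpu_sec, mflops):
    """Approximate floating-point operations implied by a time and a rate."""
    return cpu_sec * mflops * 1.0e6

band_ops = operation_count(74.8, 127.9)    # skyline method, factor and solve
iccg_ops = operation_count(455.0, 20.0)    # incomplete Choleski conjugate gradient
band_to_inv = 74.8 / 804.1                 # BAND time relative to INV factoring

print(band_ops, iccg_ops, band_to_inv)
```

The ratio is about 0.093, which supports the statement that BAND needs less than one-tenth of the INV time, with the caveat that the 74.8 sec includes the solve as well as the factoring.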

CSM Research Directions 

The broad objective of the CSM activity is to 
develop advanced structural analysis and computa- 
tional methods that exploit modern and emerging 
scientific computers, such as computers having vector 
and/or parallel processing capabilities. The evolv- 
ing computational environment (both hardware and 
software) is providing new opportunities to structural 
analysts that enable them to study the structural be- 
havior of complex nonlinear systems. 

Command Language Enhancements 

The CLIP enhancements include the implemen- 
tation of a table-driven parser and lexical analyzer. 
The UNIX utilities LEX and YACC are being used to 
implement an easily extendable language. This lan- 
guage will be primarily the CLAMP language with 
modifications to remove context sensitive constructs 
from the language. As a side benefit, the resulting 
interpreter should be more efficient and maintainable 
and should provide the required extendability. This 
extendability will be tested by the addition of lan- 
guage directives to control processor-task allocation 
and synchronization at a high level through CLAMP 
directives. The resulting capability should provide 
a convenient research environment for the structural 
analyst to investigate parallelism without relying on 
computer-dependent coding. 

Parallel Data Management 

In addition to providing machine-independent 
control of parallel mechanisms, it will be necessary 
for the Testbed to provide machine-independent par- 
allel data management. The data management com- 
ponents for engineering applications are reasonably 
well understood for sequential computers. However, 
multiple-instruction-multiple-data (MIMD) comput- 
ers are a different matter. These new architecture 
computers, with the nondeterministic nature of pro- 
cesses running in parallel, create new requirements 
for maintaining data integrity across processors. For
example, if data are written by processor one and
will be needed by processor two, a mechanism must 
exist to ensure that processor one has written the 
data before processor two successfully returns from a 
read operation on that same data. The simple solu- 
tion of blocking all subsequent I/O operations until 
processor one is complete must be avoided because 
that solution would eliminate the advantages offered 
by MIMD computers in the first place. In addition, 
if the structural analyst is to benefit from the capa- 
bility of parallel I/O on the MIMD computers, the 
implementation details and internal workings of such 
a system must be hidden in a methods development 
environment such as the CSM Testbed. 
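
The synchronization requirement described above, in which a read of a data set must not return until that data set has been written while unrelated I/O proceeds unhindered, can be sketched with one signal per named data set. The Python example below uses threads to stand in for processors; the class name `DatasetStore`, its methods, and the data set names are hypothetical and do not describe the Testbed data manager.

```python
import threading

class DatasetStore:
    """Toy data manager: a read of a named data set waits only for the
    write of that same data set, not for any other I/O."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}
        self._ready = {}

    def _event(self, name):
        # One event per data set, created on first use by reader or writer.
        with self._lock:
            return self._ready.setdefault(name, threading.Event())

    def write(self, name, value):
        event = self._event(name)
        with self._lock:
            self._data[name] = value
        event.set()                      # release any readers waiting on name

    def read(self, name, timeout=None):
        event = self._event(name)
        if not event.wait(timeout):      # wait only for this data set
            raise TimeoutError("data set %r was never written" % name)
        with self._lock:
            return self._data[name]

store = DatasetStore()
store.write("LOADS", [1.0, 2.0])         # already available

result = {}
reader = threading.Thread(
    target=lambda: result.update(STIFFNESS=store.read("STIFFNESS", timeout=5.0)))
reader.start()                           # "processor two" asks before the write

print(store.read("LOADS", timeout=1.0))  # unrelated read is not held up
store.write("STIFFNESS", "factored")     # "processor one" finishes writing
reader.join(5.0)
print(result["STIFFNESS"])
```

Hiding the events and locks behind `read` and `write` mirrors the goal stated above: the analyst sees ordinary data set operations, and the coordination details stay inside the data manager.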

Advanced Numerical Algorithms 

To complement the transition to MIMD comput- 
ers, numerical analysts are preparing a wealth of new 
algorithms designed to take advantage of the vector 
processing capability offered by many modern com- 
puters. In the past, the sparse nature of the matrices 
that dominate the structural analysis task has made 
vector processors of limited use. It is anticipated that 
work will continue on the development of numerical 
algorithms that will take full advantage of both the 
vector capabilities and the MIMD capabilities of fu- 
ture computer architectures. Such algorithms will 
be developed within the Testbed framework and will 
be evaluated on challenging structural analysis focus 
problems. 

Structural Analysis Technology 

The LaRC CSM activity for structural analysis 
technology is currently focused on methodology for 
predicting the nonlinear structural response of large- 
scale composite primary aircraft structures. Many of 
the structural analysis software systems available to- 
day can predict the nonlinear structural response of 
composite components. However, the lack of progres- 
sive failure analysis techniques in large-scale struc- 
tural analysis systems limits the analyst in the design 
of composite aerospace structures. A capability to 
model and analyze damaged composite structures is 
needed in the aerospace community. In addition, designers
need analysis tools that can be used to assess
the sensitivity of selected response parameters to
variations in material properties or loads for complex
structural systems. Finally, error sensing and control
strategies for finite-element solutions are needed
in order to provide quantitative as well as qualitative 
information about the quality of the results from such 
calculations. 
Summary 

The computational structural mechanics (CSM) 
Testbed is a powerful methods development en- 
vironment for developing structural analysis and 
computational methods. With enhancements and 
extensions for multiple-instruction multiple-data 
(MIMD) computers, the Testbed should continue to 
be a useful research environment for the foreseeable 
future. It is currently being used by researchers 
developing structural analysis methods and numer- 
ical algorithms and evaluating MIMD I/O strate- 
gies. The Testbed application environment provides 
the mechanism to allow researchers concentrating on 
different parts of the structural analysis problem to 
communicate on solutions to problems directly re- 
lated to current NASA needs. The transfer of tech- 
nology among researchers in structural engineering, 
computer science, and numerical analysis can now 
be accomplished more effectively than was previously 
possible. 

An overview of the CSM activity at the NASA 
Langley Research Center has been presented. The 
CSM Testbed software system serves as a framework 
for structural analysis and computational methods 
research for high-performance computers. The CSM 
Testbed has been described and its use demonstrated 
through solution of selected structural analysis prob- 
lems. Future directions for CSM research using the 
Testbed have been outlined. These future develop- 
ments will take full advantage of both vector proces- 
sors and parallel methods on the NAS CRAY-2 com- 
puter system and on anticipated supercomputers of 
the 1990’s. 

Appendix 

CRAY-2 Implementation 

The source code for the Testbed, both archi- 
tecture and application modules, is maintained in 
text files with embedded preprocessing commands 
which allow selective conditional precompilation by 
machine-independent utility programs. The architecture
code is made up of approximately 650 modules
totaling about 83 000 lines of source code and
include files. The application code is made up of 
approximately 1300 modules with about 95 000 lines 
in source code and include files. Distribution of the 
code in the UNIX environment is accomplished with 
the UNIX utility TAR to package the source code, 
makefiles, and scripts in a single file; this distri- 
bution file occupies approximately 8 megabytes of 
disk space. Installation of the Testbed on the NAS 
CRAY-2 computer was accomplished over a period 
of about 1 month in 1987 shortly after the computer
was made available to LaRC users. The Testbed software
system was the largest software system to be
ported to the NAS CRAY-2 computer system, and 
consequently many problems which had not been ex- 
perienced by other users had to be diagnosed and 
overcome. 

Compilation Problems 

Because the Testbed Fortran code uses character 
variables heavily, the CFT77 compiler had to be used 
for compilation. Most problems encountered with 
this compiler were related to its handling of charac- 
ter variables and formatting of output. These prob- 
lems were encountered only at execution time. Most 
of these were resolved by inserting code blocks for 
the CRAY/UNICOS version into the master source 
files so that the modifications could be carried along 
into future versions of the code. The porting of the 
Testbed to the CRAY-2 computer system was ac- 
complished using a very early version of CFT77 un- 
der UNICOS. Although several errors in the compiler 
were discovered, these errors could be worked around 
easily. These errors in the compiler have been cor- 
rected in subsequent releases of the CFT77 compiler. 

Fortran-C Interface 

One problem related to CFT77 character han- 
dling which had to be resolved twice (once under 
UNICOS 1.0 and 2.0 and again under UNICOS 3.0) 
was the difference in data structures for CFT77 char- 
acter arguments and C compiler character string ar- 
guments. This problem arises where the Fortran 
code for the data management function calls low- 
level C language I/O functions. In this respect, the 
CFT77 compiler does not conform to the same stan- 
dard as the Fortran compilers on other UNIX sys- 
tems. To overcome the problem, in the C functions a 
C structure was defined to correspond to the CFT77 
character argument; upon entry to the C function, 
a transformation was performed from the argument 
structure to a C character string. This structure was 
initially defined to be compatible with the compilers 
used under UNICOS 1.0 and 2.0. When version 3.0 of 
UNICOS was installed with a new CFT77 compiler, 
the CFT77 character variable structure was changed 
without documentation, so the C functions had to be 
modified to accommodate the new structure once the 
problem was identified. 

Loader Problems 

The initial installation procedures used the LD 
loader for linking the executable file. When the op- 
timization options for the CFT77 compiler had been 
used in compilation, all subroutine argument ad- 
dresses and some temporary variables were defined in 
local memory by the compiler. The LD loader concatenates
the local memory segments for all modules,
so attempting to link all the application modules and
libraries with the macroprocessor resulted in overflow
of local memory (40 000 octal words) and failure of
the load. The LMSTAK utility to enable overlaying 
local memory segments was used, but the resulting 
program would not execute. In order to check the 
operation of the software before resolving the local 
memory overflow problem, all the code was recom- 
piled without optimization, linked successfully, and 
tested. 

Later, following a suggestion by the NAS ana- 
lysts, the segmentation loader (SEGLDR) was used. 
This loader performed the local memory overlay cor- 
rectly, so the optimized object code could be used. 
No execution errors were encountered as a result 
of using the optimizing compiler. Performance was 
improved by a factor of 3 in CPU usage with the op- 
timized code for most of the demonstration problems 
executed. Installation of a new CFT77 compiler with 
options to enable the user to control the allocation of 
local memory has since eliminated the requirement to 
use SEGLDR for the Testbed to overlay local mem- 
ory. However, vectorization is not used efficiently 
in this version of the code because of short vector 
lengths actually used (<6 in a critical area). Much 
greater improvements should be gained by tailoring 
the matrix operations in the code to take advantage 
of vectorization. 

Optimization 

In order to identify the most promising areas for 
performance improvement, two utilities were used. 
First, the UNICOS utility FLOW was used, after re- 
compilation of the code with the CFT77 flowtrace 
(-ef) option. The resulting executable file was exe- 
cuted with several demonstration problems perform- 
ing different types of analysis functions. The FLOW 
utility analyzed the output files and identified the 
modules which were using most of the CPU time for 
the executions. A calling tree diagram was also ob- 
tained in the FLOW output, which was helpful in 
analyzing the execution path of the program. 

After identifying the biggest CPU users, the For- 
tran source code for those modules was sent to an 
IRIS workstation on which the FORGE software was 
installed. FORGE is a software system for optimiz- 
ing Fortran programs for CRAY computer systems. 
FORGE attempts to exploit many of the intricate de- 
tails of CRAY hardware and software and to restruc- 
ture the Fortran programs for faster execution on 
CRAY computer systems. FORGE was used to insert 
timing function calls into the modules, which were 
then sent back to the CRAY computer, compiled,
and linked into the executable file. The demonstration
problems were executed again and very detailed
analyses of the execution of the modules of interest 
were obtained. These analyses led to replacement of 
some code with UNICOS library function calls and 
some other minor revisions. This work resulted in an 
improvement of about 12 percent in the performance 
of the affected analyses. 

NASA Langley Research Center 
Hampton, VA 23665-5225 
February 3, 1989 